4 research outputs found
Building MDE cloud services with DISTIL
Also published online by CEUR Workshop Proceedings (CEUR-WS.org, ISSN 1613-0073).
Model-Driven Engineering (MDE) techniques, like
transformations, queries, and code generators, were devised for
local, single-CPU architectures. However, the increasing complexity
of the systems to be built and their high demands in terms of
computation, memory and storage require more scalable and
flexible MDE techniques, likely using services and the cloud.
Nonetheless, the cost of developing MDE solutions on the cloud
is high without proper automation mechanisms.
In order to alleviate this situation, we present DISTIL, a
domain-specific language to describe MDE services, which is
able to generate (NoSQL-based) repositories for the artefacts of
interest, and skeletons for (single or composite) services, ready
to be deployed in Heroku. We illustrate the approach through
the construction of a repository and a set of cloud-based services
for bentō reusable transformation components.
Work supported by the Spanish Ministry of Economy and Competitiveness (TIN2011-24139, TIN2014-52129-R), the EU Commission (FP7-ICT-2013-10, #611125) and the Community of Madrid (S2013/ICE-3006).
Parallelization of a spectral finite element code. Application to non-destructive ultrasonic testing
The subject of this thesis is to study several ways to optimize the computation time of the high-order spectral finite element method (SFEM). The goal is to improve performance on easily accessible architectures, namely SIMD multicore processors and graphics processors. As the computational kernels are limited by memory accesses (a sign of low arithmetic intensity), most of the optimizations presented aim at reducing and accelerating memory accesses. Improved matrix and vector indexing, a combination of loop transformations, task parallelism (multithreading) and data parallelism (SIMD instructions) are the transformations targeting optimal use of the cache, intensive use of registers, and multicore SIMD parallelization. The results are convincing: the proposed optimizations increase performance (between ×6 and ×11) and speed up the computation (between ×9 and ×16). The implementation coded explicitly with SIMD instructions is up to ×4 faster than the auto-vectorized implementation. The GPU implementation is two to three times faster than the CPU one, and a high-speed NVLink connection would allow better masking of memory transfers. The proposed transformations form a methodology for optimizing compute-intensive codes on common architectures and for making the most of the possibilities offered by multithreading and SIMD instructions.
Spectral finite element code parallelization. Application to non-destructive ultrasonic testing.
The subject of this thesis is to study several ways to optimize the computation time of the high-order spectral finite element method (SFEM). The goal is to improve performance on easily accessible architectures, namely SIMD multicore processors and graphics processors. As the computational kernels are limited by memory accesses (a sign of low arithmetic intensity), most of the optimizations presented aim at reducing and accelerating memory accesses. Improved matrix and vector indexing, a combination of loop transformations, task parallelism (multithreading) and data parallelism (SIMD instructions) are the transformations targeting optimal use of the cache, intensive use of registers, and multicore SIMD parallelization. The results are convincing: the proposed optimizations increase performance (between ×6 and ×11) and speed up the computation (between ×9 and ×16). The implementation coded explicitly with SIMD instructions is up to ×4 faster than the auto-vectorized implementation. The GPU implementation is two to three times faster than the CPU one, and a high-speed NVLink connection would allow better masking of memory transfers. The proposed transformations form a methodology for optimizing compute-intensive codes on common architectures and for making the most of the possibilities offered by multithreading and SIMD instructions.
A fast implementation of a spectral finite elements method on CPU and GPU applied to ultrasound propagation
In this paper we present an optimization of a spectral finite element method implementation. The improvements consist in the modification of the memory layout of the main algorithmic kernels and in the augmentation of the arithmetic intensity via loop transformations. The code has been deployed on multicore SIMD machines and on GPU. Compared to our starting point, i.e. the original scalar sequential code, we achieved a speed-up of ×228 on CPU. We present comparisons with the SPECFEM2D code that demonstrate the good performance of our implementation on similar cases. On GPU, a hybrid solution is investigated.
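The memory-layout change mentioned above typically amounts to switching from an array-of-structures to a structure-of-arrays, so that each kernel streams through contiguous, unit-stride data. The names and fields below are hypothetical, chosen only to illustrate the layout difference, not the paper's actual data structures.

```c
#include <stddef.h>

/* Array-of-structures: fields interleaved in memory. A kernel that
   reads only x touches every third float and wastes cache-line
   bandwidth on the unused y and z fields. */
typedef struct { float x, y, z; } NodeAoS;

static float sum_x_aos(const NodeAoS *nodes, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += nodes[i].x;                /* stride: sizeof(NodeAoS) bytes */
    return s;
}

/* Structure-of-arrays: each field stored contiguously. Loads become
   unit-stride, which is what both the cache and SIMD units prefer. */
typedef struct { float *x, *y, *z; } NodesSoA;

static float sum_x_soa(const NodesSoA *nodes, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += nodes->x[i];               /* stride: sizeof(float) bytes */
    return s;
}
```

Both functions compute the same result; the SoA version simply lets every byte fetched from memory carry useful data, which is the point of a layout transformation when kernels are memory-bound.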